About

I have been extremely lucky to work in many different realms, including as a journalist at the New York Times, a data scientist at the Johns Hopkins Data Science Lab and Dealer.com in Vermont, and “data artist in residence” at the startup Conduce in California.

Currently, I am a PhD candidate in biostatistics at Vanderbilt University. I received my undergraduate degree from the University of Vermont where I majored in mathematics and statistics and minored in computer science.

My research focuses on unsupervised and semi-supervised methods for exploring large omics datasets, with a heavy focus on visualization.

I like data. Manipulating it, modeling it, making it (simulation), visualizing it, and yes, even cleaning it. I do these things with some combination of R, Python, and JavaScript (d3.js in particular). Most recently I have been fascinated with conveying complex statistical topics and methods using intuitive and interactive graphics.

If you want to read about me and my pursuits in a more eloquently written form, I was recently profiled by my alma mater.

When I am not in “school mode” I love to bike places, read science fiction, take photos, and wander around gardens/museums.

Have a fantastic day!

Projects

For a much more up-to-date and topical list of my work, check out the data science/statistics/visualization blog that I run with Lucy D’Agostino McGowan: Live Free or Dichotomize.

Data Visualization Best Practices in R

Intermediate level course exploring visualization best practices in R.
Uses ggplot2 and the tidyverse packages.


Shinysense

A set of shiny modules for letting shiny sense the world around it.
Currently has touch, sound, motion, and vision 'senses.'
Bundled into an R package.


What are P-Values, Really?

A resource for explaining what statistical significance really means.
Storified to try and make it memorable.
Takes the form of a reproducible R Markdown document so others can recreate it.


Making Nice Looking Websites Using RMarkdown

A walkthrough from start to finish of making a website using RMarkdown and hosting it on GitHub.
Made in collaboration with Lucy McGowan.
Presented at the statistical computing workshop for Vanderbilt Medical Center.
See sample site here.


Conditional Survival Curves on Truncated Survival Data

A visual exploration of Kaplan-Meier survival curves on left-truncated survival data.
Drag the conditional slider to see how the survival curve changes depending on the age of entry.
All logic for the K-M curve is written from scratch in JavaScript and is much more performant than the survival package in R.
For more information on the algorithm to generate a K-M curve, see the Wikipedia page.
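For the curious, the idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the JavaScript behind the project: the function name `km_left_truncated`, its arguments, and the toy data are all made up for this example. It assumes a subject only joins the risk set after their age of entry, and that conditioning on an age simply means ignoring every event at or before that age.

```python
def km_left_truncated(entry, exit_, event, condition_on=0.0):
    """Kaplan-Meier estimate with delayed entry (left truncation).

    entry[i]  : age at which subject i came under observation
    exit_[i]  : age at which subject i had the event or was censored
    event[i]  : True if the event was observed at exit_[i]
    condition_on : only events after this age contribute, which gives
                   survival conditional on being event-free at that age

    Returns a list of (age, survival) pairs, one per distinct event age.
    """
    event_ages = sorted(
        {x for x, d in zip(exit_, event) if d and x > condition_on}
    )
    survival = 1.0
    curve = []
    for t in event_ages:
        # A subject is at risk at age t only if they entered before t
        # and have not yet left the study.
        at_risk = sum(1 for e, x in zip(entry, exit_) if e < t <= x)
        deaths = sum(1 for x, d in zip(exit_, event) if d and x == t)
        if at_risk > 0:
            survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy data (hypothetical): four subjects, two of them entering late.
entry = [0, 0, 2, 3]
exit_ = [5, 3, 6, 7]
event = [True, True, False, True]

full_curve = km_left_truncated(entry, exit_, event)
conditional_curve = km_left_truncated(entry, exit_, event, condition_on=3)
```

Moving `condition_on` plays the role of the slider: the later the age you condition on, the fewer early events drag the curve down.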


Reusable Statistics Plots in D3

Also see my histogram made in the same way.
My first attempts at making a d3 library.
Ultimately will be tied with a companion R app for interactive visualization for statisticians.
Uses the reusable d3 structure proposed by Elliot Bentley.


What's In Season?

An interactive exploration of what produce is in season.
Data scraped from here using Python.
Allows the user to select different in-season ingredients and search for recipes containing them.
Notebooks for scraping are in the GitHub repo.


Data Visualization In R

Rmarkdown document for a statistical computing workshop I gave at Vanderbilt.
A brief overview of some common visualization mistakes and code to fix them in ggplot.
Provides an overview of some newer visualization tools.


Binomially Distributed Fun!

Demonstrates how a sequence of independent Bernoulli trials makes up the binomial distribution.
Allows the user to toggle the parameters of the Bernoulli and generate samples.
Calculates and displays a 95% confidence interval and Wilson hypothesis test based upon the generated data.
All statistics functions are written from scratch in vanilla JavaScript.
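As a rough illustration of the statistics involved, here is a small Python sketch (the project itself is in JavaScript, and this is not its code). The names `simulate_binomial` and `wilson_interval` are hypothetical. The interval shown is the standard Wilson score interval, which comes from inverting the score test for a proportion; I am assuming that is close in spirit to what the page displays.

```python
import math
import random


def simulate_binomial(n, p, rng):
    """Sum of n independent Bernoulli(p) trials, i.e. one binomial draw."""
    return sum(1 for _ in range(n) if rng.random() < p)


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a proportion (z defaults to the 95% level)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return center - half_width, center + half_width


rng = random.Random(42)          # fixed seed so the run is repeatable
successes = simulate_binomial(n=50, p=0.3, rng=rng)
lower, upper = wilson_interval(successes, 50)
```

Unlike the simpler "estimate plus or minus 1.96 standard errors" interval, the Wilson interval never leaves the range 0 to 1, which is handy when the sample is small or the proportion is near an edge.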


The Likelihood Function

An interactive exploration of the likelihood function.
Visually explains the concepts of support intervals and likelihood ratios.
Allows the user to input their own data for creating figures for reports/presentations.
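To make the ideas concrete, here is a minimal Python sketch for binomial data. It is not the code behind the visualization; the function names, the grid search, and the choice of a 1/8 support interval are assumptions made for illustration. It also assumes the number of successes is strictly between 0 and n so the logarithms are defined.

```python
import math


def binomial_log_likelihood(p, successes, n):
    """Log-likelihood of p for binomial data (constant term dropped)."""
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def likelihood_ratio(p_a, p_b, successes, n):
    """How many times better the data support p_a than p_b."""
    return math.exp(
        binomial_log_likelihood(p_a, successes, n)
        - binomial_log_likelihood(p_b, successes, n)
    )


def support_interval(successes, n, k=8, grid_size=10001):
    """All p whose likelihood is at least 1/k of the maximum (grid search)."""
    p_hat = successes / n
    max_ll = binomial_log_likelihood(p_hat, successes, n)
    grid = [i / (grid_size - 1) for i in range(1, grid_size - 1)]
    inside = [
        p for p in grid
        if binomial_log_likelihood(p, successes, n) - max_ll >= -math.log(k)
    ]
    return inside[0], inside[-1]


lower, upper = support_interval(successes=5, n=10)
ratio = likelihood_ratio(0.5, 0.2, successes=5, n=10)
```

With 5 successes in 10 trials, the 1/8 support interval runs from roughly 0.21 to 0.79, and every value inside it is supported by the data at least one eighth as well as the best-supported value of 0.5.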


Confidence Intervals Explained

Allows the user to explore what a frequentist confidence interval truly is.
To many people, including the scientists who use them, the behavior of confidence intervals is confusing.
All statistics functions are written from scratch in base JavaScript. See the GitHub repo for code.


Probability Integral Transformations

Made in an effort to visualize what happens when you transform a probability distribution with a function.
Uses the normal distribution transformed by the normal CDF, resulting in a uniform distribution. See here for more info.
Inspired by my course work in Probability at Vanderbilt.
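The transformation is easy to check numerically. Below is a small Python sketch (not the code behind the visualization; the function names and the ten-bin histogram are arbitrary choices for this example) that draws standard normal values, pushes them through the normal CDF, and counts how the results spread across equal-width bins.

```python
import random
from statistics import NormalDist


def pit_sample(n, seed=0):
    """Draw n standard normal values and map each through the normal CDF."""
    rng = random.Random(seed)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [std_normal.cdf(x) for x in draws]


def bin_counts(values, bins=10):
    """Count values falling in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for v in values:
        index = min(int(v * bins), bins - 1)  # keep v == 1.0 in the last bin
        counts[index] += 1
    return counts


transformed = pit_sample(10_000)
counts = bin_counts(transformed)
```

If the transformed values are uniform, each of the ten bins should hold roughly a tenth of the draws, which is what the counts show up to sampling noise.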


Where Are Wildfires Burning?

Uses open data from NASA satellites on global temperature anomalies.
Fresh data is downloaded every day and pushed to the static page via shell scripts, avoiding the need for servers.
Data source.


Interactive Manhattan Plot R Package.

An R package to generate interactive and embeddable Manhattan plots for genome-wide association studies.
Binds R and JavaScript + D3 using the htmlwidgets package.


State Farmers Market Profiles.

Companion visualization to What Do Farmer's Markets Sell?
Explore different states' paths through different metrics relating to farmers markets.
Uses an equal-sized states map as the menu to reduce bias associated with normal projections.
Data courtesy of Data.gov.


What Do Farmer's Markets Sell?

Select different good types (e.g. Vegetables, Fruit) and see which markets sell them.
Assemble different combinations of goods to explore regional trends.
Dynamic layout adjusts to mobile or desktop views.
Be patient with it: more than eight thousand points are being drawn to the screen, and it will bog down older phones/computers.
Data courtesy of Data.gov.


Interactive Manhattan Plot Viewer.

Developed as an experiment in exploratory data visualization.
Select different controls for comparison, e.g. non-dominant arm growth, to see linked SNPs.
A Manhattan plot is a commonly used tool in assessing genetic roots for traits.
Uses data from the FAMuSS study (Thompson et al. 2004).


Learn ASL numbers with Leap Motion.

First place project at 2014 UVM CS Fair.
Teaches numbers 0-9 in American Sign Language.
Utilizes three.js and WebGL for rendering.
Built to exploit multiple HCI and cognitive psychology theories (e.g. object consistency and the generation effect) in order to maximize the learning experience.


Experimental Leap Motion + D3.js project.

Wave your hands around and watch D3.js mirror you!
Requires a leap motion device.
In the future I plan on implementing ways to interact with D3 visualizations by recognizing gestures using machine learning algorithms.
Video of it in action, for if you don't have a Leap: https://www.youtube.com/watch?v=yttEEA-Gd2A
Note: When using, start by waving your hands around above the Leap Motion device and watch it calibrate!


Polio's impact on the United States.

A project for Data Science 2 (Math 295) taught by Professor James Bagrow at the University of Vermont.
IPython notebook and data files available on my GitHub.


labinthewild.org Interactive Visualization.

A visualization developed for LabInTheWild at the University of Michigan to help participants place themselves among differing demographics.


Alternative energy filling stations in the U.S.

Using d3.hexbin I took 18k+ data points and binned them to help explore geographic trends in alternative energy filling stations.


Where does California get its energy?

A visualization that explores how electricity is generated in the state of California. Data was cleaned using Python and then the visualization was generated using d3.js.


CV


Contact

I am always interested in getting involved in new projects or just connecting with others. Feel free to get in touch!

email: n.strayer (at) vanderbilt (dot) edu

twitter: NicholasStrayer

github: nstrayer